Update GRROsqueryCollector to use threadpoolexecutor #696
results_container = containers.OsqueryResult(
    name=name,
    description=description,
    query=query,
    hostname=hostname,
    data_frame=pd.DataFrame(),
    flow_identifier=flow_identifier,
    client_identifier=client_identifier)
self.state.StoreContainer(results_container)
What's the point in storing an empty dataframe here? Wouldn't it be better just to not store any container?
I have it as an empty dataframe as the corresponding container attribute is currently not optional. It also simplifies the logic in downstream processing of the container.
My question was more "why add a container at all" if the dataframe is going to be empty anyways.
Oops, my bad, I missed the second question. My rationale for doing it this way is that no result (i.e. empty data) is still a result: it is useful feedback downstream, letting the module/user know that the query was successful and returned nothing.
OK, that makes sense, thanks!
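To make the distinction discussed above concrete, here is a minimal sketch of how downstream code might tell "query succeeded with no rows" apart from "no container was stored". Everything in it is an assumption for illustration: `QueryResult` is a hypothetical stand-in for `containers.OsqueryResult`, its `rows` list stands in for the `data_frame` attribute, and `summarize` is not part of the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryResult:
    """Hypothetical stand-in for containers.OsqueryResult."""
    query: str
    hostname: str
    # Stands in for data_frame; an empty list plays the role of pd.DataFrame().
    rows: List[Dict[str, str]] = field(default_factory=list)


def summarize(stored: Dict[str, QueryResult], hostname: str) -> str:
    """Describe what downstream code can infer for one host."""
    result: Optional[QueryResult] = stored.get(hostname)
    if result is None:
        # No container at all: the query did not complete for this host.
        return "no container: query did not complete"
    if not result.rows:
        # A container with empty data: the query ran and matched nothing.
        return "query succeeded, no rows"
    return f"query succeeded, {len(result.rows)} row(s)"


# Hypothetical stored state: host-a matched nothing, host-b matched one row.
stored = {
    "host-a": QueryResult(query="SELECT * FROM users", hostname="host-a"),
    "host-b": QueryResult(
        query="SELECT * FROM users",
        hostname="host-b",
        rows=[{"username": "root"}],
    ),
}
```

With this shape, a host that is missing from `stored` reads differently from a host whose container holds no rows, which is the feedback the empty container is meant to provide.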
Co-authored-by: Thomas Chopitea <[email protected]>
As suggested, this change uses ThreadPoolExecutor to run osquery in multiple threads.
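For readers unfamiliar with the pattern, the following is a rough sketch of fanning a query out across clients with `concurrent.futures.ThreadPoolExecutor`. It is not the code from this PR: `run_on_all_clients`, `fake_query`, the client IDs, and the `max_workers` value are all hypothetical, and the real collector launches GRR flows rather than calling a local function.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List


def run_on_all_clients(
    client_ids: List[str],
    run_query: Callable[[str], List[dict]],
    max_workers: int = 4,
) -> Dict[str, List[dict]]:
    """Run run_query once per client in a thread pool and collect the results."""
    results: Dict[str, List[dict]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its client so results can be attributed.
        future_to_client = {
            executor.submit(run_query, client_id): client_id
            for client_id in client_ids
        }
        for future in as_completed(future_to_client):
            client_id = future_to_client[future]
            # result() re-raises any exception raised inside the worker thread.
            results[client_id] = future.result()
    return results


def fake_query(client_id: str) -> List[dict]:
    """Hypothetical placeholder for launching an osquery flow on one client."""
    # Pretend that client "C.2" matches no rows, to mirror the empty-result case.
    return [] if client_id == "C.2" else [{"client": client_id}]


results = run_on_all_clients(["C.1", "C.2"], fake_query)
```

Collecting results in the calling thread, rather than writing shared state from the workers, keeps the sketch free of locking; whether the actual module does the same is not something this page shows.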